Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity
Abstract
Contrary to popular belief, we show that the optimal parameters for IBM Model 1 are not unique. We demonstrate that, for a large class of words, IBM Model 1 is indifferent among a continuum of ways to allocate probability mass to their translations. We study the magnitude of the variance in optimal model parameters using a linear programming approach as well as multiple random trials, and demonstrate that it results in variance in test set log-likelihood and alignment error rate.
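The non-uniqueness claim can be illustrated with a minimal sketch on a hypothetical toy corpus (not taken from the paper): a single sentence pair with source words "a b" and target words "x y", and no NULL source word. Under Model 1 the likelihood of this pair is (1/4)·s·(2 − s), where s = t(x|a) + t(x|b), so every table with s = 1 is optimal. The function names `log_likelihood`, `em`, and `make_table` are illustrative choices, and the EM loop is the standard Model 1 update rather than the paper's experimental setup.

```python
import math

# Hypothetical toy corpus (an assumption, not data from the paper):
# one sentence pair, source "a b", target "x y", no NULL source word.
CORPUS = [(["a", "b"], ["x", "y"])]


def make_table(p, q):
    """Build t[e][f] with t(x|a) = p and t(x|b) = q."""
    return {"a": {"x": p, "y": 1.0 - p}, "b": {"x": q, "y": 1.0 - q}}


def log_likelihood(t, corpus):
    """Sum over target words f of log((1 / len(src)) * sum_e t[e][f])."""
    total = 0.0
    for src, tgt in corpus:
        for f in tgt:
            total += math.log(sum(t[e][f] for e in src) / len(src))
    return total


def em(t, corpus, iterations=200):
    """Run standard Model 1 EM updates starting from the table t."""
    for _ in range(iterations):
        # E-step: accumulate expected alignment counts.
        counts = {e: {f: 0.0 for f in t[e]} for e in t}
        for src, tgt in corpus:
            for f in tgt:
                z = sum(t[e][f] for e in src)
                for e in src:
                    counts[e][f] += t[e][f] / z  # posterior that f aligns to e
        # M-step: renormalize the counts for each source word.
        t = {
            e: {f: c / sum(counts[e].values()) for f, c in counts[e].items()}
            for e in counts
        }
    return t


# Any table with t(x|a) + t(x|b) = 1 attains the same log-likelihood, log(1/4).
for p in (0.1, 0.5, 0.9):
    print(p, log_likelihood(make_table(p, 1.0 - p), CORPUS))

# Two different starting points: EM ends at different tables
# that nevertheless have the same log-likelihood.
t1 = em(make_table(0.9, 0.2), CORPUS)
t2 = em(make_table(0.3, 0.6), CORPUS)
print(t1["a"]["x"], log_likelihood(t1, CORPUS))
print(t2["a"]["x"], log_likelihood(t2, CORPUS))
```

In this toy case every point on the line t(x|a) + t(x|b) = 1 is a fixed point of EM, so the table EM returns depends on where it starts even though the training likelihood does not. This is the kind of indifference that the abstract describes.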
Similar Resources
Mixed Linear Regression with Multiple Components
In this paper, we study the mixed linear regression (MLR) problem, where the goal is to recover multiple underlying linear models from their unlabeled linear measurements. We propose a non-convex objective function which we show is locally strongly convex in the neighborhood of the ground truth. We use a tensor method for initialization so that the initial models are in the local strong convexi...
The IBM Mixture Models 1 and 2 for Word Alignment
This is a tutorial on the IBM models 1 and 2 for word alignment. In contrast to many other presentations, I motivate the models from a mixture model rather than from a translation perspective. This view makes it easier to derive the EM algorithms for learning and to understand why the likelihood function of the models usually has multiple optima.
Strict convexity of the free energy for non-convex gradient models at moderate β
We consider a gradient interface model on the lattice with interaction potential which is a non-convex perturbation of a convex potential. We show using a one-step multiple scale analysis the strict convexity of the surface tension at high temperature. This is an extension of Funaki and Spohn’s result [10], where the strict convexity of potential was crucial in their proof. AMS 2000 Subject Cla...
A Convex Alternative to IBM Model 2
The IBM translation models have been hugely influential in statistical machine translation; they are the basis of the alignment models used in modern translation systems. Excluding IBM Model 1, the IBM translation models, and practically all variants proposed in the literature, have relied on the optimization of likelihood functions or similar functions that are non-convex, and hence have multi...